Morphology In Statistical Machine Translation From English To Highly Inflectional Language
Abstract
In this paper, we investigate the role of morphology in phrase-based statistical machine translation (SMT) from English into Slovenian, a highly inflectional language. Translating into an inflectional language is challenging because of its morphological complexity: rich morphology increases data sparsity and lowers the quality of statistical machine translation. The aim of the paper is to find the SMT configuration, based on morpho-syntactic information, that gives the best results when translating from English into Slovenian. To this end, we added morphological information in the form of morpho-syntactic description (MSD) tags attached to words. An MSD tag encodes all morpho-syntactic information in position-dependent attributes. The tags were attached to words by TreeTagger. Several experiments were performed using MSD tags to improve the translation results. First, factored translation was studied and different configurations were tested. The results show that factored translation improves the modeling of short-distance collocations. To capture long-distance dependencies, operation sequence models (OSMs) were added in a second set of experiments, which yielded an additional improvement. The overall results show that the morpho-syntactic information of an inflectional language is an important factor in translation. Factored translation combined with OSMs brought a 9% relative improvement. The most successful configuration was tSaMaL-SaMaL (OSM: 0-0, 1-1, 2-2). The conclusions of our work can be applied to other Slavic languages, as they share the same morphological characteristics to some extent.
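As a rough illustration of the two ideas in the abstract, position-dependent MSD tags and words annotated with several factors, the sketch below decodes a tag by character position and joins tagger output into pipe-separated factored tokens (the `word|factor|factor` convention used for factored input in Moses). Everything specific here is an assumption rather than the authors' setup: the helper names `decode_msd` and `to_factored` are hypothetical, the noun attribute order follows the MULTEXT-East style, the word|MSD|lemma factor order is chosen only for the example, and the sample lines are hand-written, not real TreeTagger output.

```python
# Hypothetical helpers for illustration; not the paper's actual pipeline.

# Assumed attribute order for noun tags (MULTEXT-East style), not from the paper.
NOUN_ATTRIBUTES = ["category", "type", "gender", "number", "case"]


def decode_msd(msd, attributes=NOUN_ATTRIBUTES):
    """Map each character of a position-dependent tag to its attribute name.

    A '-' at some position means the attribute is not specified and is skipped.
    """
    return {name: value for name, value in zip(attributes, msd) if value != "-"}


def to_factored(tagger_lines):
    """Turn 'word<TAB>tag<TAB>lemma' lines into 'word|tag|lemma' tokens.

    The tab-separated input mirrors the usual one-token-per-line tagger output;
    the factor order in the result is an assumption made for this example.
    """
    tokens = []
    for line in tagger_lines:
        word, tag, lemma = line.rstrip("\n").split("\t")
        tokens.append("|".join((word, tag, lemma)))
    return " ".join(tokens)


# Hand-written sample lines (assumed tags), standing in for real tagger output.
sample = ["hiša\tNcfsn\thiša", "stoji\tVmpr3s\tstati"]
factored = to_factored(sample)
decoded = decode_msd("Ncfsn")
print(factored)
print(decoded)
```

With the three factors indexed 0, 1 and 2 in this way, a setting such as "OSM: 0-0, 1-1, 2-2" can plausibly be read as one operation sequence model per factor pair, although the abstract itself does not spell this out.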
Similar resources
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
We have investigated the potential for improvement in target language morphology when translating into Swedish from English and German, by measuring the errors made by a state-of-the-art phrase-based statistical machine translation system. Our results show that there is indeed a performance gap to be filled by better modelling of inflectional morphology and compounding; and that the gap is not ...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use the space character instead. This common incorrect use of the space leads to some s...
Syntactic Reordering for Arabic-English Phrase-Based Machine Translation
performing translation task which converts text or speech in one Natural Language (Source Language (SL)) into another Natural Language (Target Language (TL)). Translation from Arabic to English is a difficult task because Arabic is highly inflectional, with rich morphology and relatively free word order. Word ordering plays an important part in the translation process. The paper prop...
Using POS Information for Statistical Machine Translation into Morphologically Rich Languages
When translating from languages with hardly any inflectional morphology like English into morphologically rich languages, the English word forms often do not contain enough information for producing the correct full form in the target language. We investigate methods for improving the quality of such translations by making use of part-of-speech information and maximum entropy modeling. Results fo...
English-Latvian SMT: knowledge or data?
When phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, the translated texts are often not fluent because of misused inflectional forms and wrong word order between phrases or even inside a phrase. One possible way to improve translation quality is to apply factored models. The paper presents work on Englis...
Journal title:
- ITC
Volume 47
Publication date: 2018